Overview

Dataset statistics

Number of variables: 13
Number of observations: 1670995
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 495.5 MiB
Average record size in memory: 311.0 B

Variable types

Numeric: 10
Categorical: 3

Reproduction

Analysis started: 2020-05-25 10:40:53.484613
Analysis finished: 2020-05-25 10:43:42.476667
Duration: 2 minutes and 48.99 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

brand has constant value "Globe Postpaid" [Constant]
clusterId has a high cardinality: 149 distinct values [High cardinality]
dataRevenue is highly correlated with dataRevenuePredicted [High correlation]
dataRevenuePredicted is highly correlated with dataRevenue [High correlation]
smsRevenue is highly correlated with smsRevenuePredicted [High correlation]
smsRevenuePredicted is highly correlated with smsRevenue [High correlation]
data is highly skewed (γ1 = 31.9875023) [Skewed]
sms is highly skewed (γ1 = 126.3236899) [Skewed]
subsId has unique values [Unique]
data has 141095 (8.4%) zeros [Zeros]
voice has 109681 (6.6%) zeros [Zeros]
sms has 59673 (3.6%) zeros [Zeros]
dataRevenuePredicted has 219596 (13.1%) zeros [Zeros]
voiceRevenuePredicted has 222363 (13.3%) zeros [Zeros]
smsRevenuePredicted has 296058 (17.7%) zeros [Zeros]
dataRevenue has 219596 (13.1%) zeros [Zeros]
voiceRevenue has 222363 (13.3%) zeros [Zeros]
smsRevenue has 296058 (17.7%) zeros [Zeros]
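The γ1 values in the two skewness warnings are sample skewness estimates. Below is a minimal pure-Python sketch of the adjusted Fisher-Pearson (bias-corrected) estimator. It rests on an assumption the report does not confirm: that the figures come from pandas' `Series.skew()`, which uses this bias-corrected form. The function name `sample_skewness` is illustrative only.

```python
import math


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness G1 (bias-corrected sample skewness)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # biased 2nd central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # biased 3rd central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for a constant sequence")
    g1 = m3 / m2 ** 1.5                            # uncorrected moment ratio
    return math.sqrt(n * (n - 1)) / (n - 2) * g1   # small-sample correction


print(sample_skewness([1, 2, 3, 4, 5]))
print(sample_skewness([0, 0, 0, 0, 10]))
```

Symmetric input gives 0. A single large value among zeros gives √5 ≈ 2.236, a strong right skew. On millions of rows, a few extreme values in a long right tail push the estimate much higher, which is the pattern behind the `data` and `sms` warnings.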

Variables

subsId
Categorical

UNIQUE

Distinct count: 1670995
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 12.7 MiB
Common values

Value                             Count    Frequency (%)
J41EATby/kOcRuBqMAKDnZ2d/+6auWRl  1        < 0.1%
J41EATby/kOcRuBtNAT8o5rA+JLQuWRl  1        < 0.1%
J41EATby/kOcD+hsPkqLnZq3+JrRn2Rl  1        < 0.1%
J41EATby/kOcRuBqMDuPhpqL3o6buWRl  1        < 0.1%
J41EATby/kOcD+htMDuPppqy3pKauWRl  1        < 0.1%
J41EATby/kOcD5NqMAKTgJq13pLRn2Rl  1        < 0.1%
J41EATby/kOcRuBtNDj8hpq13o6buWRl  1        < 0.1%
J41EATby/kOcAN5sPlmDkZ2d+KabuWRl  1        < 0.1%
J41EATby/kOcD+hqNAT8g5qJ7IqauWRl  1        < 0.1%
J41EATby/kOcD+BsMlmThJqL3rKauWRl  1        < 0.1%
J41EATby/kOcD+BsMlmDlJ2d+JrAuWRl  1        < 0.1%
J41EATby/kOcD5NsNDj8nZq3+I6buWRl  1        < 0.1%
J41EATby/kOcRuBsMASLkZqJ7JLQn2Rl  1        < 0.1%
J41EATby/kOcRuBsNAKDgZqI3pqauWRl  1        < 0.1%
J41EATby/kOcAM5tPkqhg5qz7LLAuWRl  1        < 0.1%
J41EATby/kOcRuBsMAeho5qL3o7QuWRl  1        < 0.1%
J41EATby/kOcD+hqNAT8kZqL3o7Rn2Rl  1        < 0.1%
J41EATby/kOcD5NsPk6LkZq3+IrQj2Rl  1        < 0.1%
J41EATby/kOcRuBtPlmTgZqJ7LzQuWRl  1        < 0.1%
J41EATby/kOcAM5tPlmTo5qJ7LybuWRl  1        < 0.1%
J41EATby/kOcRuBtPkr8gZq13o7QuWRl  1        < 0.1%
J41EATby/kOcD+BsMDuPkZqz4+7Rj2Rl  1        < 0.1%
J41EATby/kOcRuBsPlmDppq13rLRj2Rl  1        < 0.1%
J41EATby/kOcAN5tMAShhpqz7I7Bj2Rl  1        < 0.1%
J41EATby/kOcD+BsMlmDgZqJ4+7Qj2Rl  1        < 0.1%
Other values (1670970)            1670970  > 99.9%

Length

Max length: 32
Median length: 32
Mean length: 32
Min length: 32

Overview of Unicode Properties

Unique Unicode characters: 52
Unique Unicode categories: 5
Unique Unicode scripts: 2
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
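The category tables in this report use the Unicode general categories. The sketch below only illustrates the idea and is not the report's own implementation. Python's standard `unicodedata.category()` returns the two-letter category code for each character. The mapping from those codes to the long names used here was written by hand and covers only the categories that occur in this dataset. The standard library has no script property, so the script tables are not reproduced. The helper names `category_counts` and `in_ascii_block` are hypothetical.

```python
import unicodedata
from collections import Counter

# Hand-written map from two-letter Unicode general-category codes to the long
# names used in this report; it only covers the categories seen in this dataset.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Pc": "Connector Punctuation",
    "Po": "Other Punctuation",
    "Sm": "Math Symbol",
    "Zs": "Space Separator",
}


def category_counts(values):
    """Count characters per Unicode general category across a list of strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)  # e.g. "Lu" for "G"
            counts[CATEGORY_NAMES.get(code, code)] += 1  # fall back to the raw code
    return counts


def in_ascii_block(values):
    """True if every character lies in the Basic Latin (ASCII) block, U+0000 to U+007F."""
    return all(ord(ch) < 0x80 for text in values for ch in text)


print(category_counts(["Globe Postpaid"]))
print(in_ascii_block(["nonOutlier_197", "outlier_205.15"]))
```

For "Globe Postpaid" the first call counts 11 lowercase letters, 2 uppercase letters and 1 space separator. That is 78.6%, 14.3% and 7.1%, the same proportions as the category table for brand.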

Most occurring characters

Value              Count     Frequency (%)
A                  3161844   5.9%
J                  2671160   5.0%
R                  2499840   4.7%
k                  2142413   4.0%
b                  2101290   3.9%
l                  2081244   3.9%
y                  1954147   3.7%
T                  1930444   3.6%
q                  1922224   3.6%
1                  1841600   3.4%
4                  1727152   3.2%
E                  1672900   3.1%
O                  1663464   3.1%
c                  1660444   3.1%
/                  1629504   3.0%
u                  1250999   2.3%
D                  1204572   2.3%
2                  1198164   2.2%
5                  1082192   2.0%
M                  1064170   2.0%
+                  1026342   1.9%
N                  992995    1.9%
h                  981954    1.8%
L                  952937    1.8%
s                  909514    1.7%
Other values (27)  12148331  22.7%

Most occurring categories

Value              Count     Frequency (%)
Uppercase Letter   22475360  42.0%
Lowercase Letter   20240847  37.9%
Decimal Number     8099787   15.1%
Other Punctuation  1629504   3.0%
Math Symbol        1026342   1.9%

Most frequent Uppercase Letter characters

Value  Count    Frequency (%)
A      3161844  14.1%
J      2671160  11.9%
R      2499840  11.1%
T      1930444  8.6%
E      1672900  7.4%
O      1663464  7.4%
D      1204572  5.4%
M      1064170  4.7%
N      992995   4.4%
L      952937   4.2%
B      849211   3.8%
W      819189   3.6%
Q      667199   3.0%
Z      649188   2.9%
P      477110   2.1%
K      388422   1.7%
I      386512   1.7%
S      357295   1.6%
X      64519    0.3%
G      2389     < 0.1%

Most frequent Decimal Number characters

Value  Count    Frequency (%)
1      1841600  22.7%
4      1727152  21.3%
2      1198164  14.8%
5      1082192  13.4%
7      856502   10.6%
3      834307   10.3%
8      323639   4.0%
6      236231   2.9%

Most frequent Lowercase Letter characters

Value  Count    Frequency (%)
k      2142413  10.6%
b      2101290  10.4%
l      2081244  10.3%
y      1954147  9.7%
q      1922224  9.5%
c      1660444  8.2%
u      1250999  6.2%
h      981954   4.9%
s      909514   4.5%
p      763570   3.8%
r      630460   3.1%
j      606297   3.0%
g      556282   2.7%
t      510393   2.5%
n      482220   2.4%
z      316652   1.6%
o      291752   1.4%
e      269780   1.3%
x      256174   1.3%
a      234519   1.2%
d      176610   0.9%
m      141909   0.7%

Most frequent Other Punctuation characters

Value  Count    Frequency (%)
/      1629504  100.0%

Most frequent Math Symbol characters

Value  Count    Frequency (%)
+      1026342  100.0%

Most occurring scripts

Value   Count     Frequency (%)
Latin   42716207  79.9%
Common  10755633  20.1%

Most frequent Latin characters

Value              Count    Frequency (%)
A                  3161844  7.4%
J                  2671160  6.3%
R                  2499840  5.9%
k                  2142413  5.0%
b                  2101290  4.9%
l                  2081244  4.9%
y                  1954147  4.6%
T                  1930444  4.5%
q                  1922224  4.5%
E                  1672900  3.9%
O                  1663464  3.9%
c                  1660444  3.9%
u                  1250999  2.9%
D                  1204572  2.8%
M                  1064170  2.5%
N                  992995   2.3%
h                  981954   2.3%
L                  952937   2.2%
s                  909514   2.1%
B                  849211   2.0%
W                  819189   1.9%
p                  763570   1.8%
Q                  667199   1.6%
Z                  649188   1.5%
r                  630460   1.5%
Other values (17)  5518835  12.9%

Most frequent Common characters

Value  Count    Frequency (%)
1      1841600  17.1%
4      1727152  16.1%
/      1629504  15.2%
2      1198164  11.1%
5      1082192  10.1%
+      1026342  9.5%
7      856502   8.0%
3      834307   7.8%
8      323639   3.0%
6      236231   2.2%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  53471840  100.0%

Most frequent ASCII characters

Value              Count     Frequency (%)
A                  3161844   5.9%
J                  2671160   5.0%
R                  2499840   4.7%
k                  2142413   4.0%
b                  2101290   3.9%
l                  2081244   3.9%
y                  1954147   3.7%
T                  1930444   3.6%
q                  1922224   3.6%
1                  1841600   3.4%
4                  1727152   3.2%
E                  1672900   3.1%
O                  1663464   3.1%
c                  1660444   3.1%
/                  1629504   3.0%
u                  1250999   2.3%
D                  1204572   2.3%
2                  1198164   2.2%
5                  1082192   2.0%
M                  1064170   2.0%
+                  1026342   1.9%
N                  992995    1.9%
h                  981954    1.8%
L                  952937    1.8%
s                  909514    1.7%
Other values (27)  12148331  22.7%

data
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 1520586
Unique (%): 91.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6944.327708413956
Minimum: 0.0
Maximum: 2139749.193825725
Zeros: 141095
Zeros (%): 8.4%
Memory size: 12.7 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 615.6311697
Median: 3182.976384
Q3: 8467.330739
95th percentile: 23514.35447
Maximum: 2139749.194
Range: 2139749.194
Interquartile range (IQR): 7851.69957

Descriptive statistics

Standard deviation: 18262.48613
Coefficient of variation (CV): 2.629842211
Kurtosis: 1982.225868
Mean: 6944.327708
Median Absolute Deviation (MAD): 3042.76966
Skewness: 31.9875023
Sum: 1.160393688e+10
Variance: 333518399.7
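Several of the descriptive statistics above can be derived from the others:

- coefficient of variation = standard deviation / mean
- IQR = Q3 - Q1
- range = maximum - minimum
- variance = standard deviation squared

Putting the published figures for `data` back into these definitions reproduces the reported values up to rounding. A quick arithmetic check, using only numbers copied from the report:

```python
# Published summary statistics for the `data` column (copied from the report).
mean = 6944.327708
std = 18262.48613
q1, q3 = 615.6311697, 8467.330739
minimum, maximum = 0.0, 2139749.194

cv = std / mean                  # coefficient of variation: spread relative to the mean
iqr = q3 - q1                    # interquartile range: width of the middle 50%
value_range = maximum - minimum  # full span of observed values
variance = std ** 2              # variance is the squared standard deviation

print(f"CV={cv:.6f} IQR={iqr:.5f} range={value_range} variance={variance:.1f}")
```

A CV of about 2.63 means the standard deviation is more than two and a half times the mean, consistent with the heavy right tail flagged in the skewness warning.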
Histogram with fixed size bins (bins=10)

Common values

Value                   Count    Frequency (%)
0                       141095   8.4%
0.0002                  613      < 0.1%
0.0001                  416      < 0.1%
0.0003                  275      < 0.1%
0.0004                  272      < 0.1%
0.0005                  215      < 0.1%
0.0006                  195      < 0.1%
0.0007                  166      < 0.1%
0.001                   119      < 0.1%
0.0011                  113      < 0.1%
0.0008                  105      < 0.1%
0.0014                  99       < 0.1%
0.0009                  95       < 0.1%
0.0015                  89       < 0.1%
0.0012                  84       < 0.1%
0.0006                  74       < 0.1%
0.0013                  69       < 0.1%
0.0018                  60       < 0.1%
0.0016                  60       < 0.1%
0.0012                  60       < 0.1%
0.002                   59       < 0.1%
0.0003                  57       < 0.1%
0.0021                  56       < 0.1%
0.0023                  53       < 0.1%
0.0026                  48       < 0.1%
Other values (1520561)  1526448  91.3%

Minimum 10 values

Value            Count   Frequency (%)
0                141095  8.4%
4.067632434e-06  1       < 0.1%
1.342487536e-05  1       < 0.1%
1.516539208e-05  1       < 0.1%
1.691178812e-05  1       < 0.1%
1.809773927e-05  1       < 0.1%
1.832454513e-05  1       < 0.1%
2.103087305e-05  1       < 0.1%
2.104947636e-05  1       < 0.1%
2.263261949e-05  1       < 0.1%

Maximum 10 values

Value        Count  Frequency (%)
2139749.194  1      < 0.1%
2075870.408  1      < 0.1%
2074207.292  1      < 0.1%
2056893.437  1      < 0.1%
1925177.265  1      < 0.1%
1837929.954  1      < 0.1%
1805984.798  1      < 0.1%
1798152.12   1      < 0.1%
1753827.461  1      < 0.1%
1743709.275  1      < 0.1%

voice
Real number (ℝ≥0)

ZEROS

Distinct count: 6299
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 274.3947707802836
Minimum: 0.0
Maximum: 28011.0
Zeros: 109681
Zeros (%): 6.6%
Memory size: 12.7 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 37
Median: 129
Q3: 330
95th percentile: 990
Maximum: 28011
Range: 28011
Interquartile range (IQR): 293

Descriptive statistics

Standard deviation: 481.0275064
Coefficient of variation (CV): 1.753049102
Kurtosis: 163.2474291
Mean: 274.3947708
Median Absolute Deviation (MAD): 111
Skewness: 8.362652881
Sum: 458512290
Variance: 231387.4619
Histogram with fixed size bins (bins=10)

Common values

Value                Count    Frequency (%)
0                    109681   6.6%
1                    15671    0.9%
2                    13311    0.8%
3                    11786    0.7%
4                    10789    0.6%
5                    10340    0.6%
6                    10009    0.6%
7                    9657     0.6%
8                    9146     0.5%
9                    9144     0.5%
10                   8788     0.5%
11                   8656     0.5%
13                   8567     0.5%
12                   8496     0.5%
14                   8292     0.5%
15                   8168     0.5%
16                   8166     0.5%
18                   7842     0.5%
17                   7696     0.5%
19                   7686     0.5%
21                   7657     0.5%
20                   7588     0.5%
22                   7492     0.4%
24                   7428     0.4%
23                   7352     0.4%
Other values (6274)  1341587  80.3%

Minimum 10 values

Value  Count   Frequency (%)
0      109681  6.6%
1      15671   0.9%
2      13311   0.8%
3      11786   0.7%
4      10789   0.6%
5      10340   0.6%
6      10009   0.6%
7      9657    0.6%
8      9146    0.5%
9      9144    0.5%

Maximum 10 values

Value  Count  Frequency (%)
28011  1      < 0.1%
27858  1      < 0.1%
23765  1      < 0.1%
23400  1      < 0.1%
22076  1      < 0.1%
21776  1      < 0.1%
20330  1      < 0.1%
20180  1      < 0.1%
19914  1      < 0.1%
19630  1      < 0.1%

sms
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 7971
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 250.01709580220168
Minimum: 0.0
Maximum: 536352.0
Zeros: 59673
Zeros (%): 3.6%
Memory size: 12.7 MiB

Quantile statistics

Minimum: 0
5th percentile: 2
Q1: 23
Median: 72
Q3: 214
95th percentile: 957
Maximum: 536352
Range: 536352
Interquartile range (IQR): 191

Descriptive statistics

Standard deviation: 1660.24333
Coefficient of variation (CV): 6.640519218
Kurtosis: 23426.08747
Mean: 250.0170958
Median Absolute Deviation (MAD): 61
Skewness: 126.3236899
Sum: 417777317
Variance: 2756407.913
Histogram with fixed size bins (bins=10)

Common values

Value                Count    Frequency (%)
0                    59673    3.6%
4                    20474    1.2%
1                    20261    1.2%
3                    20235    1.2%
2                    20225    1.2%
6                    18518    1.1%
5                    18202    1.1%
8                    17694    1.1%
7                    17153    1.0%
9                    16955    1.0%
10                   16699    1.0%
12                   15918    1.0%
11                   15732    0.9%
14                   15113    0.9%
13                   15070    0.9%
15                   14517    0.9%
16                   14297    0.9%
17                   13926    0.8%
18                   13898    0.8%
19                   13345    0.8%
20                   13240    0.8%
21                   12959    0.8%
22                   12942    0.8%
23                   12289    0.7%
24                   12242    0.7%
Other values (7946)  1229418  73.6%

Minimum 10 values

Value  Count  Frequency (%)
0      59673  3.6%
1      20261  1.2%
2      20225  1.2%
3      20235  1.2%
4      20474  1.2%
5      18202  1.1%
6      18518  1.1%
7      17153  1.0%
8      17694  1.1%
9      16955  1.0%

Maximum 10 values

Value   Count  Frequency (%)
536352  1      < 0.1%
441888  1      < 0.1%
342177  1      < 0.1%
340597  1      < 0.1%
337645  1      < 0.1%
324460  1      < 0.1%
315062  1      < 0.1%
313807  1      < 0.1%
302950  1      < 0.1%
280995  1      < 0.1%

revenue
Real number (ℝ≥0)

Distinct count: 8109
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1279.4329318394728
Minimum: 5.4433
Maximum: 16496.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.7 MiB

Quantile statistics

Minimum: 5.4433
5th percentile: 499
Q1: 799
Median: 999
Q3: 1798
95th percentile: 2499
Maximum: 16496
Range: 16490.5567
Interquartile range (IQR): 999

Descriptive statistics

Standard deviation: 752.1332333
Coefficient of variation (CV): 0.5878645255
Kurtosis: 9.589338837
Mean: 1279.432932
Median Absolute Deviation (MAD): 400
Skewness: 2.13414743
Sum: 2137926032
Variance: 565704.4007
Histogram with fixed size bins (bins=10)

Common values

Value                Count   Frequency (%)
999                  284357  17.0%
599                  228612  13.7%
799                  207093  12.4%
1799                 199957  12.0%
1499                 192466  11.5%
1299                 90862   5.4%
2499                 57624   3.4%
1999                 33433   2.0%
499                  33007   2.0%
399                  25942   1.6%
299                  18970   1.1%
3799                 15563   0.9%
2198                 13473   0.8%
500                  11922   0.7%
2898                 7623    0.5%
1598                 7491    0.4%
1678                 7153    0.4%
1178                 7092    0.4%
4198                 7010    0.4%
332.75               6741    0.4%
1928                 6684    0.4%
50                   6392    0.4%
888                  6107    0.4%
1898                 5567    0.3%
1128                 5043    0.3%
Other values (8084)  184811  11.1%

Minimum 10 values

Value    Count  Frequency (%)
5.4433   1      < 0.1%
5.4532   1      < 0.1%
7.6781   1      < 0.1%
8.6707   4      < 0.1%
8.6757   1      < 0.1%
8.6787   2      < 0.1%
9.6416   2      < 0.1%
11.9006  3      < 0.1%
11.9007  3      < 0.1%
11.9064  4      < 0.1%

Maximum 10 values

Value       Count  Frequency (%)
16496       1      < 0.1%
15998.01    3      < 0.1%
15998       2      < 0.1%
13417.6719  1      < 0.1%
12643.5745  2      < 0.1%
12478.64    1      < 0.1%
11293       1      < 0.1%
10998       1      < 0.1%
10837.3539  1      < 0.1%
10720.5817  1      < 0.1%

brand
Categorical

CONSTANT
REJECTED

Distinct count: 1
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 12.7 MiB
Common values

Value           Count    Frequency (%)
Globe Postpaid  1670995  100.0%

Length

Max length: 14
Median length: 14
Mean length: 14
Min length: 14

Overview of Unicode Properties

Unique Unicode characters: 13
Unique Unicode categories: 3
Unique Unicode scripts: 2
Unique Unicode blocks: 1

Most occurring characters

Value    Count    Frequency (%)
o        3341990  14.3%
G        1670995  7.1%
l        1670995  7.1%
b        1670995  7.1%
e        1670995  7.1%
(space)  1670995  7.1%
P        1670995  7.1%
s        1670995  7.1%
t        1670995  7.1%
p        1670995  7.1%
a        1670995  7.1%
i        1670995  7.1%
d        1670995  7.1%

Most occurring categories

Value             Count     Frequency (%)
Lowercase Letter  18380945  78.6%
Uppercase Letter  3341990   14.3%
Space Separator   1670995   7.1%

Most frequent Uppercase Letter characters

Value  Count    Frequency (%)
G      1670995  50.0%
P      1670995  50.0%

Most frequent Lowercase Letter characters

Value  Count    Frequency (%)
o      3341990  18.2%
l      1670995  9.1%
b      1670995  9.1%
e      1670995  9.1%
s      1670995  9.1%
t      1670995  9.1%
p      1670995  9.1%
a      1670995  9.1%
i      1670995  9.1%
d      1670995  9.1%

Most frequent Space Separator characters

Value    Count    Frequency (%)
(space)  1670995  100.0%

Most occurring scripts

Value   Count     Frequency (%)
Latin   21722935  92.9%
Common  1670995   7.1%

Most frequent Latin characters

Value  Count    Frequency (%)
o      3341990  15.4%
G      1670995  7.7%
l      1670995  7.7%
b      1670995  7.7%
e      1670995  7.7%
P      1670995  7.7%
s      1670995  7.7%
t      1670995  7.7%
p      1670995  7.7%
a      1670995  7.7%
i      1670995  7.7%
d      1670995  7.7%

Most frequent Common characters

Value    Count    Frequency (%)
(space)  1670995  100.0%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  23393930  100.0%

Most frequent ASCII characters

Value    Count    Frequency (%)
o        3341990  14.3%
G        1670995  7.1%
l        1670995  7.1%
b        1670995  7.1%
e        1670995  7.1%
(space)  1670995  7.1%
P        1670995  7.1%
s        1670995  7.1%
t        1670995  7.1%
p        1670995  7.1%
a        1670995  7.1%
i        1670995  7.1%
d        1670995  7.1%

clusterId
Categorical

HIGH CARDINALITY

Distinct count: 149
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 12.7 MiB
Common values

Value               Count    Frequency (%)
nonOutlier_197      56126    3.4%
nonOutlier_255      38374    2.3%
nonOutlier_209      35507    2.1%
nonOutlier_297      30716    1.8%
nonOutlier_161      27623    1.7%
nonOutlier_85       25166    1.5%
nonOutlier_20       20007    1.2%
nonOutlier_53       19869    1.2%
nonOutlier_185      18918    1.1%
nonOutlier_108      18391    1.1%
nonOutlier_205      18354    1.1%
nonOutlier_62       17113    1.0%
nonOutlier_158      16945    1.0%
nonOutlier_191      16894    1.0%
nonOutlier_30       16437    1.0%
nonOutlier_199      16431    1.0%
outlier_205.15      16193    1.0%
nonOutlier_39       15896    1.0%
nonOutlier_160      15706    0.9%
nonOutlier_129      15629    0.9%
nonOutlier_68       15421    0.9%
nonOutlier_169      15269    0.9%
nonOutlier_192      15177    0.9%
nonOutlier_145      14692    0.9%
nonOutlier_289      14669    0.9%
Other values (124)  1139472  68.2%

Length

Max length: 16
Median length: 14
Mean length: 13.95216503
Min length: 12

Overview of Unicode Properties

Unique Unicode characters: 21
Unique Unicode categories: 5
Unique Unicode scripts: 2
Unique Unicode blocks: 1

Most occurring characters

Value  Count    Frequency (%)
n      3023088  13.0%
o      1670995  7.2%
u      1670995  7.2%
t      1670995  7.2%
l      1670995  7.2%
i      1670995  7.2%
e      1670995  7.2%
r      1670995  7.2%
_      1670995  7.2%
O      1511544  6.5%
1      1157862  5.0%
2      939905   4.0%
5      592796   2.5%
0      512356   2.2%
.      372341   1.6%
8      370198   1.6%
9      351140   1.5%
7      336100   1.4%
3      297333   1.3%
6      271072   1.2%
4      210303   0.9%

Most occurring categories

Value                  Count     Frequency (%)
Lowercase Letter       14720053  63.1%
Decimal Number         5039065   21.6%
Connector Punctuation  1670995   7.2%
Uppercase Letter       1511544   6.5%
Other Punctuation      372341    1.6%

Most frequent Lowercase Letter characters

Value  Count    Frequency (%)
n      3023088  20.5%
o      1670995  11.4%
u      1670995  11.4%
t      1670995  11.4%
l      1670995  11.4%
i      1670995  11.4%
e      1670995  11.4%
r      1670995  11.4%

Most frequent Uppercase Letter characters

Value  Count    Frequency (%)
O      1511544  100.0%

Most frequent Connector Punctuation characters

Value  Count    Frequency (%)
_      1670995  100.0%

Most frequent Decimal Number characters

Value  Count    Frequency (%)
1      1157862  23.0%
2      939905   18.7%
5      592796   11.8%
0      512356   10.2%
8      370198   7.3%
9      351140   7.0%
7      336100   6.7%
3      297333   5.9%
6      271072   5.4%
4      210303   4.2%

Most frequent Other Punctuation characters

Value  Count   Frequency (%)
.      372341  100.0%

Most occurring scripts

Value   Count     Frequency (%)
Latin   16231597  69.6%
Common  7082401   30.4%

Most frequent Latin characters

Value  Count    Frequency (%)
n      3023088  18.6%
o      1670995  10.3%
u      1670995  10.3%
t      1670995  10.3%
l      1670995  10.3%
i      1670995  10.3%
e      1670995  10.3%
r      1670995  10.3%
O      1511544  9.3%

Most frequent Common characters

Value  Count    Frequency (%)
_      1670995  23.6%
1      1157862  16.3%
2      939905   13.3%
5      592796   8.4%
0      512356   7.2%
.      372341   5.3%
8      370198   5.2%
9      351140   5.0%
7      336100   4.7%
3      297333   4.2%
6      271072   3.8%
4      210303   3.0%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  23313998  100.0%

Most frequent ASCII characters

Value  Count    Frequency (%)
n      3023088  13.0%
o      1670995  7.2%
u      1670995  7.2%
t      1670995  7.2%
l      1670995  7.2%
i      1670995  7.2%
e      1670995  7.2%
r      1670995  7.2%
_      1670995  7.2%
O      1511544  6.5%
1      1157862  5.0%
2      939905   4.0%
5      592796   2.5%
0      512356   2.2%
.      372341   1.6%
8      370198   1.6%
9      351140   1.5%
7      336100   1.4%
3      297333   1.3%
6      271072   1.2%
4      210303   0.9%

dataRevenuePredicted
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct count: 1448178
Unique (%): 86.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 550.3867643952545
Minimum: 0.0
Maximum: 23707.17016910278
Zeros: 219596
Zeros (%): 13.1%
Memory size: 12.7 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 11.9288669
Median: 225.2723017
Q3: 969.2987249
95th percentile: 1827.736836
Maximum: 23707.17017
Range: 23707.17017
Interquartile range (IQR): 957.369858

Descriptive statistics

Standard deviation: 666.0653108
Coefficient of variation (CV): 1.210176832
Kurtosis: 7.784348364
Mean: 550.3867644
Median Absolute Deviation (MAD): 225.2723017
Skewness: 1.610628698
Sum: 919693531.4
Variance: 443642.9983
Histogram with fixed size bins (bins=10)

Common values

Value                   Count    Frequency (%)
0                       219596   13.1%
5.101499129e-06         55       < 0.1%
5.793866844e-06         47       < 0.1%
1.65990438e-06          34       < 0.1%
0.0002877980035         31       < 0.1%
1.079804646e-05         30       < 0.1%
8.579106278e-06         28       < 0.1%
9.198047422e-06         24       < 0.1%
2.896933422e-06         24       < 0.1%
3.875707352e-06         24       < 0.1%
8.299521901e-06         23       < 0.1%
7.751414703e-06         23       < 0.1%
2.550749564e-06         21       < 0.1%
0.0004316970053         21       < 0.1%
3.319808761e-06         21       < 0.1%
0.0005755960071         21       < 0.1%
0.0001438990018         20       < 0.1%
1.682338549e-05         19       < 0.1%
2.255225223e-05         17       < 0.1%
1.161933066e-05         16       < 0.1%
0.0007194950089         16       < 0.1%
9.959426282e-06         15       < 0.1%
0.0008633940106         15       < 0.1%
4.979713141e-06         15       < 0.1%
1.127612611e-05         15       < 0.1%
Other values (1448153)  1450824  86.8%

Minimum 10 values

Value            Count   Frequency (%)
0                219596  13.1%
5.678980783e-08  1       < 0.1%
1.03237192e-07   1       < 0.1%
1.054433649e-07  1       < 0.1%
1.069784889e-07  1       < 0.1%
1.079303859e-07  1       < 0.1%
1.188604592e-07  1       < 0.1%
1.19553824e-07   1       < 0.1%
1.360401498e-07  1       < 0.1%
1.414507668e-07  1       < 0.1%

Maximum 10 values

Value        Count  Frequency (%)
23707.17017  1      < 0.1%
20535.52964  1      < 0.1%
17426.40902  1      < 0.1%
15586.53647  1      < 0.1%
15560.79331  1      < 0.1%
14677.39432  1      < 0.1%
14606.3031   1      < 0.1%
14418.29102  1      < 0.1%
14289.83402  1      < 0.1%
14120.58677  1      < 0.1%

voiceRevenuePredicted
Real number (ℝ≥0)

ZEROS

Distinct count: 52394
Unique (%): 3.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 418.36403773044697
Minimum: 0.0
Maximum: 14947.934872527232
Zeros: 222363
Zeros (%): 13.3%
Memory size: 12.7 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 23.09784958
Median: 211.7424847
Q3: 649.944579
95th percentile: 1412.76486
Maximum: 14947.93487
Range: 14947.93487
Interquartile range (IQR): 626.8467295

Descriptive statistics

Standard deviation: 557.6118981
Coefficient of variation (CV): 1.332838982
Kurtosis: 25.09425328
Mean: 418.3640377
Median Absolute Deviation (MAD): 211.7424847
Skewness: 3.296263002
Sum: 699084215.2
Variance: 310931.0289
Histogram with fixed size bins (bins=10)

Common values

Value                 Count    Frequency (%)
0                     222363   13.3%
44.35234549           811      < 0.1%
88.70469098           598      < 0.1%
4.802503189           500      < 0.1%
133.0570365           478      < 0.1%
1.93199149            468      < 0.1%
15.73754913           453      < 0.1%
9.605006377           430      < 0.1%
39.53575462           430      < 0.1%
3.86398298            425      < 0.1%
12.54556965           414      < 0.1%
236.063237            412      < 0.1%
0.9478114819          408      < 0.1%
221.7617275           383      < 0.1%
31.47509826           380      < 0.1%
177.409382            379      < 0.1%
283.2758844           376      < 0.1%
14.40750957           374      < 0.1%
13.63694921           373      < 0.1%
8.917066885           366      < 0.1%
5.795974471           366      < 0.1%
299.0134335           365      < 0.1%
267.5383352           358      < 0.1%
251.8007861           356      < 0.1%
7.727965961           351      < 0.1%
Other values (52369)  1438378  86.1%

Minimum 10 values

Value          Count   Frequency (%)
0              222363  13.3%
0.06859984581  141     < 0.1%
0.1278676012   163     < 0.1%
0.1371996916   147     < 0.1%
0.1712812279   76      < 0.1%
0.1735838188   154     < 0.1%
0.1929535719   159     < 0.1%
0.1951436176   5       < 0.1%
0.2057995374   114     < 0.1%
0.2557352024   163     < 0.1%

Maximum 10 values

Value        Count  Frequency (%)
14947.93487  1      < 0.1%
14816.04133  1      < 0.1%
13020.82366  1      < 0.1%
12954.87689  1      < 0.1%
12947.54947  1      < 0.1%
12885.26641  1      < 0.1%
12808.32851  1      < 0.1%
12793.67367  1      < 0.1%
12782.68254  1      < 0.1%
12746.04545  1      < 0.1%

smsRevenuePredicted
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct count: 61312
Unique (%): 3.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 228.1703653534256
Minimum: 0.0
Maximum: 22928.001073226926
Zeros: 296058
Zeros (%): 17.7%
Memory size: 12.7 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 4.945962975
Median: 46.27551601
Q3: 296.8329611
95th percentile: 1019.085424
Maximum: 22928.00107
Range: 22928.00107
Interquartile range (IQR): 291.8869982

Descriptive statistics

Standard deviation: 374.9456973
Coefficient of variation (CV): 1.643270793
Kurtosis: 26.76355773
Mean: 228.1703654
Median Absolute Deviation (MAD): 46.27551601
Skewness: 2.774820483
Sum: 381271539.7
Variance: 140584.2759
Histogram with fixed size bins (bins=10)

Common values

Value                 Count    Frequency (%)
0                     296058   17.7%
11.39302417           797      < 0.1%
36.87243716           751      < 0.1%
22.78604833           675      < 0.1%
3.485800936           632      < 0.1%
45.57209666           631      < 0.1%
73.74487431           629      < 0.1%
110.6173115           616      < 0.1%
34.1790725            614      < 0.1%
10.45740281           600      < 0.1%
147.4897486           599      < 0.1%
6.971601872           582      < 0.1%
4.146788403           558      < 0.1%
24.66709525           545      < 0.1%
16.58715361           534      < 0.1%
13.94320374           533      < 0.1%
12.44036521           532      < 0.1%
8.293576807           523      < 0.1%
3.33051803            493      < 0.1%
49.68866596           491      < 0.1%
56.96512083           481      < 0.1%
15.27311281           470      < 0.1%
17.42900468           469      < 0.1%
68.358145             465      < 0.1%
20.73394202           456      < 0.1%
Other values (61287)  1361261  81.5%

Minimum 10 values

Value            Count   Frequency (%)
0                296058  17.7%
0.0003899035586  25      < 0.1%
0.0007798071173  38      < 0.1%
0.001169710676   36      < 0.1%
0.001559614235   48      < 0.1%
0.001949517793   55      < 0.1%
0.002339421352   47      < 0.1%
0.002729324911   57      < 0.1%
0.003119228469   55      < 0.1%
0.003509132028   46      < 0.1%

Maximum 10 values

Value        Count  Frequency (%)
22928.00107  1      < 0.1%
21772.07376  1      < 0.1%
16549.91674  1      < 0.1%
14500.75102  1      < 0.1%
14387.06782  1      < 0.1%
10486.57191  1      < 0.1%
7911.676202  1      < 0.1%
7495.210396  1      < 0.1%
6715.246259  1      < 0.1%
6676.018071  1      < 0.1%

dataRevenue
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct count: 1371360
Unique (%): 82.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 583.9378945965124
Minimum: 0.0
Maximum: 12643.5745
Zeros: 219596
Zeros (%): 13.1%
Memory size: 12.7 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 14.11029907
Median: 262.4046133
Q3: 999
95th percentile: 1816.452797
Maximum: 12643.5745
Range: 12643.5745
Interquartile range (IQR): 984.8897009

Descriptive statistics

Standard deviation: 689.0691117
Coefficient of variation (CV): 1.180038353
Kurtosis: 2.394431278
Mean: 583.9378946
Median Absolute Deviation (MAD): 262.4046133
Skewness: 1.338096967
Sum: 975757302.2
Variance: 474816.2407
Histogram with fixed size bins (bins=10)

Common values

Value                   Count    Frequency (%)
0                       219596   13.1%
999                     27027    1.6%
599                     13928    0.8%
799                     10497    0.6%
1499                    6389     0.4%
1299                    4305     0.3%
499                     2092     0.1%
1799                    1817     0.1%
299                     1714     0.1%
50                      1350     0.1%
2499                    691      < 0.1%
1128                    584      < 0.1%
399                     500      < 0.1%
3799                    469      < 0.1%
1999                    463      < 0.1%
598                     462      < 0.1%
1098                    371      < 0.1%
300                     347      < 0.1%
1178                    319      < 0.1%
500                     277      < 0.1%
898                     240      < 0.1%
928                     191      < 0.1%
1598                    178      < 0.1%
888                     167      < 0.1%
1298                    164      < 0.1%
Other values (1371335)  1376857  82.4%

Minimum 10 values

Value            Count   Frequency (%)
0                219596  13.1%
4.033344825e-08  1       < 0.1%
8.365554269e-08  1       < 0.1%
8.891543244e-08  1       < 0.1%
9.538970373e-08  1       < 0.1%
1.008753072e-07  1       < 0.1%
1.050391029e-07  1       < 0.1%
1.159086231e-07  1       < 0.1%
1.194923765e-07  1       < 0.1%
1.242036327e-07  1       < 0.1%

Maximum 10 values

Value        Count  Frequency (%)
12643.5745   2      < 0.1%
10204.84618  1      < 0.1%
9352.9687    1      < 0.1%
8533.623012  1      < 0.1%
8460.598416  1      < 0.1%
8398         8      < 0.1%
8239.745905  1      < 0.1%
8128         1      < 0.1%
8079.917043  1      < 0.1%
8062.9061    1      < 0.1%

voiceRevenue
Real number (ℝ≥0)

ZEROS

Distinct count: 1416187
Unique (%): 84.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 447.4594534052984
Minimum: 0.0
Maximum: 15574.880911502554
Zeros: 222363
Zeros (%): 13.3%
Memory size: 12.7 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 24.43840241
Median: 235.317158
Q3: 671.9352854
95th percentile: 1485.452624
Maximum: 15574.88091
Range: 15574.88091
Interquartile range (IQR): 647.496883

Descriptive statistics

Standard deviation: 597.935246
Coefficient of variation (CV): 1.336289225
Kurtosis: 16.34980047
Mean: 447.4594534
Median Absolute Deviation (MAD): 234.6116059
Skewness: 2.987280169
Sum: 747702509.3
Variance: 357526.5584
Histogram with fixed size bins (bins=10)

Common values

Value                   Count    Frequency (%)
0                       222363   13.3%
999                     8244     0.5%
599                     1973     0.1%
799                     1370     0.1%
399                     601      < 0.1%
499                     461      < 0.1%
299                     307      < 0.1%
332.75                  239      < 0.1%
1178                    222      < 0.1%
500                     133      < 0.1%
1098                    127      < 0.1%
1799                    122      < 0.1%
300                     92       < 0.1%
1128                    79       < 0.1%
50                      74       < 0.1%
1499                    74       < 0.1%
1198                    73       < 0.1%
1088                    64       < 0.1%
898                     61       < 0.1%
133.1                   60       < 0.1%
1299                    53       < 0.1%
888                     47       < 0.1%
347.7537836             45       < 0.1%
688                     40       < 0.1%
384.9213303             38       < 0.1%
Other values (1416162)  1434033  85.8%

Minimum 10 values

Value          Count   Frequency (%)
0              222363  13.3%
0.01261880484  1       < 0.1%
0.02131756006  1       < 0.1%
0.03791272075  1       < 0.1%
0.05452517886  1       < 0.1%
0.05481075236  1       < 0.1%
0.05507232511  1       < 0.1%
0.05574109386  1       < 0.1%
0.05594937173  1       < 0.1%
0.05599623645  1       < 0.1%

Maximum 10 values

Value        Count  Frequency (%)
15574.88091  1      < 0.1%
15207.38876  1      < 0.1%
14602.41745  1      < 0.1%
13268.42166  1      < 0.1%
12279.3486   1      < 0.1%
10181.25513  1      < 0.1%
10002.699    1      < 0.1%
9743.077349  1      < 0.1%
9738.451532  1      < 0.1%
9498.4933    1      < 0.1%

smsRevenue
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct count: 1316006
Unique (%): 78.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 248.0355838376625
Minimum: 0.0
Maximum: 9566.834161612367
Zeros: 296058
Zeros (%): 17.7%
Memory size: 12.7 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 5.213030217
Median: 49.63484142
Q3: 332.75
95th percentile: 1114.186221
Maximum: 9566.834162
Range: 9566.834162
Interquartile range (IQR): 327.5369698

Descriptive statistics

Standard deviation: 395.1969905
Coefficient of variation (CV): 1.593307639
Kurtosis: 6.877999462
Mean: 248.0355838
Median Absolute Deviation (MAD): 49.63484142
Skewness: 2.272678728
Sum: 414466220.4
Variance: 156180.6613
Histogram with fixed size bins (bins=10)

Common values

Value                   Count    Frequency (%)
0                       296058   17.7%
599                     8731     0.5%
799                     5863     0.4%
999                     5431     0.3%
1499                    4339     0.3%
1799                    2592     0.2%
299                     1890     0.1%
1299                    1829     0.1%
499                     1642     0.1%
399                     1280     0.1%
50                      941      0.1%
332.75                  916      0.1%
500                     592      < 0.1%
2499                    588      < 0.1%
133.1                   376      < 0.1%
888                     338      < 0.1%
1999                    334      < 0.1%
1178                    272      < 0.1%
1678                    247      < 0.1%
688                     219      < 0.1%
698                     205      < 0.1%
898                     172      < 0.1%
349                     137      < 0.1%
1098                    132      < 0.1%
2198                    125      < 0.1%
Other values (1315981)  1335746  79.9%

Minimum 10 values

Value            Count   Frequency (%)
0                296058  17.7%
0.0002530191929  1       < 0.1%
0.0003030788662  1       < 0.1%
0.000356185438   1       < 0.1%
0.0003610480349  1       < 0.1%
0.0003629059139  1       < 0.1%
0.0003757977677  1       < 0.1%
0.0003819027817  2       < 0.1%
0.0003896392971  1       < 0.1%
0.0003903581883  1       < 0.1%

Maximum 10 values

Value        Count  Frequency (%)
9566.834162  1      < 0.1%
8398         1      < 0.1%
8195.285992  1      < 0.1%
7981.05418   1      < 0.1%
7863.130512  1      < 0.1%
7224.819179  1      < 0.1%
7016.064125  1      < 0.1%
6976.6726    1      < 0.1%
6247.599234  1      < 0.1%
5950.943302  1      < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and 1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and positive scale of the two variables. In particular, for an exact linear relationship the slope does not affect r; only its sign does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
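A minimal pure-Python sketch of that definition follows. The helper name `pearson_r` is illustrative and is not part of pandas-profiling. The sample-size factors in the covariance and the two standard deviations cancel, so the sketch works directly with sums of products.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # The 1/(n - 1) factors of cov, sx and sy cancel, so they are omitted.
    return cov / (sx * sy)


xs, ys = [1, 2, 3, 5], [2, 1, 4, 3]
print(pearson_r(xs, ys))
print(pearson_r([10 * x + 3 for x in xs], ys))  # same r after a location/scale change
```

Both calls print the same value. Replacing `10 * x` with `-10 * x` would flip only the sign of r.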

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and 1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
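The recipe above amounts to "rank each variable, then take Pearson's r of the ranks". The sketch below is pure Python and gives tied values the average of the ranks they occupy, which is the usual convention. The helper names are hypothetical. The example contrasts ρ with r on a monotonic but nonlinear relationship.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
cubes = [v ** 3 for v in x]
print(spearman_rho(x, cubes), pearson_r(x, cubes))
```

Because y = x³ is strictly increasing, ρ equals 1. Pearson's r on the raw values is only about 0.94, because the relationship is not a straight line.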

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and 1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
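The count-based definition above is the tau-a variant. The sketch below implements it directly by examining every pair. The function name is illustrative. Library routines such as SciPy's `kendalltau` (which pandas uses for `corr(method="kendall")`) default to the tau-b variant. Tau-b adjusts the denominator for ties and agrees with tau-a when there are no ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1   # both variables move in the same direction
        elif s < 0:
            discordant += 1   # the variables move in opposite directions
        # s == 0 (a tie in x or in y) counts as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

In that example, 5 of the 6 pairs are concordant and 1 is discordant, so τ = (5 - 1) / 6 ≈ 0.667.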

Phik (φk)

Phik (φk) is a newer correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available in the phik package documentation.

Missing values

Sample

First rows

index  subsId  data  voice  sms  revenue  brand  clusterId  dataRevenuePredicted  voiceRevenuePredicted  smsRevenuePredicted  dataRevenue  voiceRevenue  smsRevenue
0  J41EATby/kOcD+hsMASxhp2d+JLRj2Rl  6317.891335  27.0  7.0  1499.0  Globe Postpaid  nonOutlier_127  1286.391797  4.686763  1.683700  1491.613240  5.434455  1.952305
1  J41EATby/kOcD5NtMAT8nZqy3rzRj2Rl  1947.336953  23.0  70.0  1078.0  Globe Postpaid  nonOutlier_192  222.616901  41.587835  289.731403  433.228674  80.932950  563.838375
2  J41EATby/kOcD+hsPk6LppqI3pqauWRl  5221.555423  549.0  88.0  599.0  Globe Postpaid  nonOutlier_205  146.483612  576.906560  5.343544  120.405687  474.202060  4.392253
3  J41EATby/kOcD5NsNASLnZq3/+7Qn2Rl  7572.696437  228.0  33.0  1799.0  Globe Postpaid  nonOutlier_12  1417.089260  821.881365  9.620958  1133.751277  657.551414  7.697309
4  J41EATby/kOcAM5qMAKTkZqz7JLRn2Rl  1453.581084  183.0  33.0  1977.0  Globe Postpaid  nonOutlier_11  147.551389  1133.509674  99.977322  211.224467  1622.654844  143.120689
5  J41EATby/kOcD5NsNASLo52d+JbQuWRl  23480.104545  163.0  77.0  1799.0  Globe Postpaid  nonOutlier_269  1558.531222  59.903914  18.373927  1712.965630  65.839776  20.194594
6  J41EATby/kOcD+hqNDuPnZqJ7JrQuWRl  0.000000  0.0  74.0  999.0  Globe Postpaid  nonOutlier_143  0.000000  0.000000  750.041299  0.000000  0.000000  999.000000
7  J41EATby/kOcAM5sMkqLlJ2d+JrRj2Rl  1945.774417  432.0  470.0  999.0  Globe Postpaid  nonOutlier_209  56.367789  288.085401  436.275380  72.126759  368.626596  558.246644
8  J41EATby/kOcD+BsNASxhp2Z7LLQj2Rl  12504.552249  605.0  216.0  799.0  Globe Postpaid  nonOutlier_161  731.073833  357.694142  142.436250  474.436312  232.128524  92.435163
9  J41EATby/kOcD5NqMAKDkZqL3rzRj2Rl  0.000000  155.0  45.0  799.0  Globe Postpaid  nonOutlier_208  0.000000  701.279785  44.726868  0.000000  751.095913  47.904087

Last rows

index  subsId  data  voice  sms  revenue  brand  clusterId  dataRevenuePredicted  voiceRevenuePredicted  smsRevenuePredicted  dataRevenue  voiceRevenue  smsRevenue
1670985  J41EATby/kOcD+hsMAehhprA+JLAuWRl  4386.555650  1658.0  8807.0  1799.0000  Globe Postpaid  outlier_205.8  58.628366  1217.277307  479.250508  60.092903  1247.684907  491.222190
1670986  J41EATby/kOcRuBqNASLg5rA+I6auWRl  0.032700  8.0  1282.0  599.0000  Globe Postpaid  outlier_205.12.2  0.000579  14.135512  106.050813  0.002885  70.450035  528.547080
1670987  J41EATby/kOcD+htMAT8gJ2Z7JLQj2Rl  56957.557020  31.0  31.0  1499.0000  Globe Postpaid  outlier_205.7.0  1243.856496  23.431362  7.091680  1463.097007  27.561344  8.341650
1670988  J41EATby/kOcRuBtMkr8ppqI3pbRj2Rl  43716.364295  892.0  143.0  3799.0000  Globe Postpaid  outlier_205.18.1  1623.541121  3605.512498  1.009232  1179.303747  2618.963169  0.733084
1670989  J41EATby/kOcRuBqMAKTppq13rzRj2Rl  1957.039154  779.0  1777.0  799.0000  Globe Postpaid  outlier_205.17  11.530992  638.952231  6.467238  14.024288  777.110091  7.865621
1670990  J41EATby/kOcAM5sNASLhpq3+I7QqWRl  3549.967787  563.0  152.0  5398.0000  Globe Postpaid  outlier_205.1  268.270520  2062.668464  9.937130  618.624906  4756.460330  22.914765
1670991  J41EATby/kOcD+htMAehhJ2Z7LLQj2Rl  154740.937429  0.0  0.0  999.0000  Globe Postpaid  outlier_205.5.0  270.401325  0.000000  0.000000  999.000000  0.000000  0.000000
1670992  J41EATby/kOcAN5sPkr8gJqJ7LzAuWRl  3423.890440  1650.0  1020.0  1499.0000  Globe Postpaid  outlier_205.8  45.761895  1211.403834  55.505339  52.257632  1383.358247  63.384122
1670993  J41EATby/kOcRuBqMASLhpqI3pqauWRl  20790.104865  6.0  1.0  4917.3842  Globe Postpaid  outlier_205.1  1571.105026  21.982257  0.065376  4849.332541  67.849872  0.201787
1670994  J41EATby/kOcD5NqMDuPppqL3rLRj2Rl  1872.329191  1441.0  108.0  599.0000  Globe Postpaid  outlier_205.15  23.897468  519.817750  0.837042  26.286886  571.792379  0.920735
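In every sample row shown, dataRevenue, voiceRevenue and smsRevenue add up to revenue, to the 6 decimal places displayed. The column sums in the descriptive statistics agree as well: 975757302.2 + 747702509.3 + 414466220.4 ≈ 2137926032, the Sum reported for revenue. This is an observation about the figures in this report, not a documented property of the dataset. The small sketch below re-checks a few transcribed rows and the column sums. The helper name `parts_match_total` and the tolerance are illustrative choices.

```python
# Values transcribed from the sample rows above:
# index: (revenue, dataRevenue, voiceRevenue, smsRevenue)
sample_rows = {
    0: (1499.0, 1491.613240, 5.434455, 1.952305),
    1: (1078.0, 433.228674, 80.932950, 563.838375),
    6: (999.0, 0.000000, 0.000000, 999.000000),
    1670993: (4917.3842, 4849.332541, 67.849872, 0.201787),
}


def parts_match_total(total, parts, tol=1e-5):
    """True if the parts add up to the total; the tolerance absorbs 6-decimal display rounding."""
    return abs(sum(parts) - total) <= tol


for index, (revenue, *parts) in sample_rows.items():
    print(index, parts_match_total(revenue, parts))

# Column sums copied from each variable's descriptive statistics.
revenue_sum = 2137926032
component_sums = [975757302.2, 747702509.3, 414466220.4]  # dataRevenue, voiceRevenue, smsRevenue
print(abs(sum(component_sums) - revenue_sum) / revenue_sum)  # relative difference
```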